The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Published in Neural Information Processing Systems, 2023

Citations: 32